A clean data set of EST-confirmed splice sites from Homo sapiens and standards for clean-up procedures

Author

  • Thangavel Alphonse Thanaraj
Abstract

A clean data set of verified splice sites from Homo sapiens is reported, together with the standards used for the clean-up procedure. The sites were validated by: (i) standard cleaning procedures such as requiring consistency in the annotation of the gene structural elements, completeness of the coding regions and elimination of redundant sequences; (ii) clustering by decision trees coupled with analysis of ClustalW alignments of the translated protein sequence with homologous proteins from SWISS-PROT; (iii) matching against human EST sequences. The sites are categorised as: (i) donor sites, a set of 619 EST-confirmed donor sites, of which 138 are sites that are themselves, or whose surrounding regions are, involved in alternative splice events; (ii) acceptor sites, a set of 623 EST-confirmed acceptor sites, of which 144 are sites that are themselves, or whose surrounding regions are, involved in alternative splice events; (iii) genuine splice sites, a set of 392 splice sites wherein both the donor and acceptor sites had EST confirmation and were not involved in any alternative splicing; (iv) alternative splice sites, a set of 209 splice sites wherein both the donor and acceptor sites had EST confirmation and the sites or the regions around them were involved in alternative splicing. A set of nucleotide regions that can be used to generate a control set of false splice sites with a high confidence of being non-functional is also reported.
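The four categories above can be read as a simple decision rule over per-site evidence. The following is a minimal sketch of that rule, not the author's implementation: the `Intron` record, its field names and the `categories` function are hypothetical, and it assumes that EST confirmation and involvement in alternative splicing have already been determined for each site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    """Hypothetical record for one intron; field names are illustrative only."""
    donor_est: bool      # donor site confirmed by a human EST match
    acceptor_est: bool   # acceptor site confirmed by a human EST match
    donor_alt: bool      # donor site or its surrounding region is in an alternative splice event
    acceptor_alt: bool   # acceptor site or its surrounding region is in an alternative splice event


def categories(intron: Intron) -> set:
    """Return the names of the sets (as described in the abstract) that apply."""
    found = set()
    if intron.donor_est:
        found.add("donor")
    if intron.acceptor_est:
        found.add("acceptor")
    # "genuine" and "alternative" both require EST confirmation at both ends.
    if intron.donor_est and intron.acceptor_est:
        if intron.donor_alt or intron.acceptor_alt:
            found.add("alternative")
        else:
            found.add("genuine")
    return found
```

Under this reading, an intron confirmed at only one end contributes to the donor or acceptor set but to neither of the paired sets, which is consistent with the paired sets (392 and 209) being smaller than the donor and acceptor sets (619 and 623).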

Similar resources

Accurate Splice Site Detection for Caenorhabditis elegans

We propose a new system for predicting the splice form of Caenorhabditis elegans genes. As a first step we generate a clean set of genes from available expressed sequence tags (EST) and complete complementary (cDNA) sequences. From all such genes we then generate potential acceptor and donor sites as they would be required by any gene finder. This leads to a clean set of true and decoy splice si...

Physical separation of amphiprotic-polar aprotic solvents for simultaneous extraction and clean-up of clomiphene from plasma before liquid chromatographic analyzes

An efficient and quantitative two phase freezing (TPF) method coupled with high performance liquid chromatography and UV-Vis detector was developed for the extraction, clean up and determination of clomiphene citrate (CLC) in plasma samples. The separation of two miscible solvents by TPF method permits that the CLC was efficiently removed from proteins and transferred into the relative aprotic ...

Computational comparative analyses of alternative splicing regulation using full-length cDNA of various eukaryotes.

We previously reported a computational approach to infer alternative splicing patterns from Mus musculus full-length cDNA clones and microarray data. Although we predicted a large number of unreported splice variants, the general mechanisms regulating alternative splicing were yet unknown. In the present study, we compared alternative exons and constitutive exons in terms of splice-site strengt...

Comprehensive splice-site analysis using comparative genomics

We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT-AG and GC-AG and U12-type GT-AG and AT-AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent ...

Journal:
  • Nucleic acids research

Volume 27, Issue 13

Pages: -

Publication date: 1999